AirOps, an AI-powered platform focused on managing and generating content at scale, announced a $15.5 million Series A funding round to expand its operations. Initially offering broad LLM-based tools for businesses, AirOps shifted focus to content generation and optimization after identifying this as a key area where AI can add value. The platform allows companies to use various language models to create text and images while maintaining quality through guardrails and human oversight. With the rise of LLM-generated content on the web, AirOps aims to help businesses produce consistent, brand-aligned content efficiently. The round was led by Unusual Ventures, with participation from other investors.
OpenAI introduced "canvas," a new interface for ChatGPT that provides users with a separate workspace for writing and coding projects, allowing for easier interaction and collaboration with the AI. This feature, available in beta to ChatGPT Plus and Team users, enables users to generate text or code directly in the canvas and make specific edits by highlighting sections. Like competing features such as Anthropic’s Artifacts, canvas offers a more practical, editable workspace for refining AI-generated outputs without reworking entire prompts. For coding, users can add comments, review code, and ask the AI for explanations or fixes. OpenAI aims to enhance user productivity and expand its paid user base with this feature, which will eventually roll out to free users as well.
ElevenLabs, a two-year-old AI startup specializing in synthetic voice generation for audiobooks and video dubbing, is reportedly raising a new funding round that could value the company at up to $3 billion. The company has seen rapid growth, with its annual recurring revenue (ARR) rising from $25 million to $80 million within the past year. Investors are eager to get involved, with some offering high valuations to secure a stake. Despite the interest, the valuation multiple is slightly lower due to the volatility of consumer revenue, which forms a significant part of ElevenLabs' income. This new round would be the company's third in a little over a year, following its Series B in January, co-led by Andreessen Horowitz. ElevenLabs competes in a growing market against major players like Google and OpenAI, but stands out with its human voice cloning capabilities.
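The figures in the ElevenLabs summary imply a growth factor and a valuation multiple that the text does not state outright. The short sketch below works them out from the reported numbers only; the variable names are illustrative, and the valuation is the reported upper bound ("up to $3 billion"), so the resulting multiple is a ceiling rather than a confirmed deal term.

```python
# Figures as reported in the summary above (USD).
arr_previous = 25_000_000           # ARR roughly a year ago
arr_current = 80_000_000            # ARR today
valuation_ceiling = 3_000_000_000   # "up to $3 billion" (upper bound)

# How many times ARR grew over the past year.
arr_growth = arr_current / arr_previous

# Valuation expressed as a multiple of current ARR.
valuation_multiple = valuation_ceiling / arr_current

print(f"ARR growth: {arr_growth:.1f}x")
print(f"Implied valuation multiple: {valuation_multiple:.1f}x ARR")
```

With these inputs, ARR grew 3.2x and the valuation ceiling corresponds to 37.5 times current ARR.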
Microsoft has expanded its Copilot AI capabilities, introducing new features for Windows users, including Copilot Vision, a tool that can analyze and respond to what's on your screen. Copilot Vision, currently in a U.S.-only preview, allows users to interact with web pages through Microsoft Edge by answering questions or assisting with tasks, all while ensuring privacy by deleting data after interactions. New features like "Think Deeper" and "Copilot Voice" enhance reasoning and conversational capabilities. Additionally, Microsoft is introducing personalization settings that tailor Copilot’s responses based on user preferences. This rollout continues Microsoft's push to integrate AI across its products, building on its enterprise-focused Copilot tools, while navigating concerns around data privacy and legal issues.
Google's AI-powered note-taking and research assistant, NotebookLM, originally launched at the I/O developer conference and expanded globally, has introduced new features aimed at broadening its use. Initially popular among educators and learners, the tool is now attracting business professionals, with features like AI-generated audio discussions and summaries of YouTube videos and audio files. Users can now share Audio Overviews via public URLs and summarize key points from various media formats. Powered by Google's Gemini 1.5 Pro model, NotebookLM’s updates are driven by user feedback, and the tool's privacy-focused design ensures user data is not used for AI training. Despite challenges like potential oversimplification, Google is actively expanding its functionality and considering mobile apps for next year.
The article reviews Plaud.AI’s NotePin, a wearable device designed to streamline note-taking and transcription for professionals who frequently attend meetings. Traditional methods like typing notes or recording with phones and laptops often disrupt conversations or produce unreadable notes. Plaud.AI offers an efficient alternative with NotePin, which magnetically attaches to a user’s clothing and records conversations at the tap of a button. The recordings are stored in real-time on the user’s phone, and AI-powered tools transcribe and summarize them. While similar to services like Otter.AI, Plaud's deliberate design and ease of use aim to solve real-world issues for people who need accurate and efficient transcription, though its long-term success depends on whether enough users face similar challenges.
Sasha Luccioni, a leading AI researcher recognized by Time magazine, has raised concerns about the high energy consumption of AI models, particularly generative ones like ChatGPT and Midjourney. Unlike traditional search engines, these models create new content, consuming up to 30 times more energy. Speaking at the ALL IN Artificial Intelligence conference, Luccioni highlighted that AI and cryptocurrency used about 2% of global electricity in 2022, and this figure is expected to rise. To address this, she helped develop CodeCarbon, a tool to track the carbon footprint of code, and is working on an energy-efficiency rating system for AI models. Despite tech giants like Google and Microsoft pledging carbon neutrality, their emissions have surged due to increased AI use. Luccioni advocates for "energy sobriety," urging responsible use of AI to mitigate its environmental impact.
Mistral AI, a Paris-based AI startup recently valued at $6 billion, provides advanced AI models for developers. To attract more users, the company launched a free tier allowing developers to fine-tune and build test apps with its AI models via its API platform, la Plateforme. This move follows a growing trend among AI providers, such as OpenAI and Google, offering more for less to compete in the commoditizing AI space. Mistral also slashed prices for accessing its models and added image processing capabilities to its chatbot, le Chat, with its new multimodal AI model Pixtral 12B.
Jacob Jackson, co-founder of AI coding assistant Tabnine, has launched Supermaven, a new AI coding platform featuring the in-house model Babble with a large 1 million-token context window for improved accuracy and reduced hallucinations. Supermaven offers lower latency and fast processing of large codebases, competing with tools like GitHub Copilot and Google’s Code Assist. Despite legal and ethical concerns around code privacy and copyright, over 35,000 developers use Supermaven, which recently raised $12 million in funding to expand its team and continue development. The company has grown rapidly since its February launch, reaching $1 million in annual recurring revenue.
Runway has launched an API to integrate its generative AI video models, starting with Gen-3 Alpha Turbo, into third-party platforms, with pricing at one cent per credit. Despite competition from major players like OpenAI and Google, Runway is one of the first to offer video-generation through an API. However, legal concerns persist about its training data, as reports suggest Runway may have used copyrighted content without permission. This raises broader questions about AI video tools and their impact on industries like film and TV, where AI adoption is already disrupting jobs.
OpenAI’s o1 models pause to "think" before answering, focusing on complex reasoning. The o1 model is four times more expensive than GPT-4o and lacks GPT-4o's tools, speed, and multimodal abilities. It is designed for big questions, and OpenAI admits GPT-4o is better for most tasks. The model excels at multi-step reasoning, breaking large problems down into small steps. Its internal thinking process generates hidden "reasoning tokens" that add to the cost of each response. The model is helpful for planning complex tasks but overthinks simple questions. Rumors linked o1 to AGI, but OpenAI confirmed it is not AGI. Many experts agree o1 is not the revolutionary step forward that GPT-4 represented. The model can question users' reasoning on big decisions but is not a decision-maker. The key question remains whether its reasoning ability justifies its higher cost.
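To make the cost point concrete, here is a minimal sketch of how hidden reasoning tokens can inflate a bill. The per-token prices and token counts are hypothetical placeholders, not published rates; the only figure taken from the summary is the four-times price ratio. The sketch also assumes reasoning tokens are billed at the output-token rate.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Price in USD per one million tokens (hypothetical values)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int,
                 visible_output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Return the cost of one request in USD.

    Assumption: hidden reasoning tokens are billed like output tokens,
    even though the user never sees them.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical baseline prices; the reasoning model is set to 4x of them,
# matching the "four times more expensive" figure in the summary.
BASELINE = Pricing(input_per_million=1.0, output_per_million=4.0)
REASONING = Pricing(input_per_million=4.0, output_per_million=16.0)

# Same prompt and same visible answer length for both models; the
# reasoning model also spends 2,000 hidden tokens (a made-up count).
baseline_cost = request_cost(BASELINE, input_tokens=1_000, visible_output_tokens=500)
reasoning_cost = request_cost(REASONING, input_tokens=1_000,
                              visible_output_tokens=500, reasoning_tokens=2_000)

print(f"baseline:  ${baseline_cost:.4f}")
print(f"reasoning: ${reasoning_cost:.4f}")
print(f"ratio:     {reasoning_cost / baseline_cost:.1f}x")
```

Under these assumed numbers the effective gap is well above four times, because the hidden tokens are charged on top of the higher per-token rate. The real ratio depends on how much the model "thinks" for a given question.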
Verse is an AI-powered iOS app that allows users to create multimedia content on an interactive canvas. Users can design mini websites, called Verses, for moodboards, greeting cards, fan pages, blogs, and more. The app was founded by Bobby Pinckney and Michelle Yin, who previously co-founded the music app Discz. Verse offers a multimodal canvas for adding photos, videos, GIFs, songs, links, and more, unlike static platforms like Canva or Wix. Verse’s AI assistant, powered by LLMs from OpenAI, Anthropic, Meta’s Llama, and Mistral, helps guide users through the creation process. The app allows users to share their Verses as mini websites across social media platforms. Verse is designed for creative self-expression, offering a medium for users who prefer not to make video content like TikToks. The platform is also used by brands and artists, like Kenya Grace and Lunar, for marketing and immersive content creation. The app includes a social aspect, allowing users to browse and comment on Verses created by others in categories like Music, Art, and Gaming. Verse is free, but the company may introduce a subscription model and plans to expand to Android and the web.
Adobe's AI video generation model, Firefly, will be available in Premiere Pro beta and a free website by the end of 2024. Three Firefly features — Generative Extend, Text to Video, and Image to Video — are currently in private beta and will be released publicly soon. Generative Extend allows users to extend video clips by two seconds, and will be added to Premiere Pro later this year. Text to Video and Image to Video generate five-second videos from prompts or images and will be available on Firefly's website. Adobe plans to expand the time limit for Text to Video and Image to Video features over time. Adobe aims to differentiate its AI tools by providing more control and integration with existing workflows. Firefly's generative fill feature in Photoshop is one of Adobe's most frequently used tools. Pricing for Firefly’s AI video features hasn’t been disclosed, but users may receive generative credits depending on their Creative Cloud subscription. Generative Extend predicts additional footage based on the last few frames, but does not replicate human voices or music to avoid licensing issues. Firefly includes safeguards against generating videos with nudity, drugs, alcohol, and public figures, emphasizing ethical considerations.
Anthropic launched Claude Enterprise, a new AI chatbot subscription for businesses, competing with OpenAI's ChatGPT Enterprise. Claude Enterprise allows companies to upload proprietary knowledge so that Claude can analyze it, answer questions about it, and act as an AI assistant. Anthropic is catching up to OpenAI, having released features similar to ChatGPT's, including mobile apps and a Team plan. Claude Enterprise features a large context window of 500,000 tokens, much larger than that of ChatGPT Enterprise. It includes Projects and Artifacts workspaces for collaborative content editing and project management. GitHub integration allows engineering teams to sync code repositories with Claude for codebase analysis and development. Businesses can assign a primary owner for managing access, security, and compliance within Claude. Anthropic ensures that Claude Enterprise customer data is not used for model training, similar to ChatGPT Enterprise. Pricing for Claude Enterprise is undisclosed but higher than the Team plan due to additional features. Early adopters like GitLab and Midjourney have tested Claude Enterprise, but broader adoption is crucial for profitability.
OpenAI launches SearchGPT, an AI-powered search engine designed to revolutionize internet searches. SearchGPT presents results as structured summaries instead of traditional links. SearchGPT includes a feature called "visual answers," though details remain scarce. Currently, SearchGPT is available to 10,000 test users as a prototype. It is powered by the GPT-4 model family and collaborates with third-party partners for enhanced search results. The long-term goal is to integrate SearchGPT's capabilities into ChatGPT. SearchGPT's release poses a potential threat to Google's search engine dominance. OpenAI emphasizes a collaborative approach with news organizations to address content usage concerns. Publishers have control over how their content appears and can opt out of training OpenAI's models. Launching SearchGPT as a prototype helps mitigate inaccuracies and content attribution issues.
Quora's Poe lets users chat with AI assistants and build web apps with them. The new feature is called Previews and requires a $20 monthly subscription. Users can create data visualizations, games and more by giving instructions to chatbots. Previews allows using multiple chatbots and incorporating information from uploaded files. The created web apps can be shared with others via a link. This feature is similar to Anthropic's Artifacts but supports broader AI models and functionalities. Poe recommends chatbots like Claude and GPT-4o for complex app creation with Previews. The launch comes amid controversy over Poe allowing access to paywalled news articles.
AI voice assistants are being replaced with synthetic human voice models. Truecaller allows users to answer calls with their own voice using Microsoft's Personal Voice technology. This feature is available for paid Truecaller users. Users record a short script to create a digital copy of their voice. Truecaller Assistant can greet callers with a digital version of the user's voice. Greetings must clarify that it's a digital voice. Follow-up responses can be customized by users. Microsoft's technology adds watermarks to identify synthetic audio. Truecaller believes this will improve user experience. The feature is rolling out in phases to beta users in selected countries.
OpenAI released GPT-4o, an update to their GPT-4 model. The "o" stands for "omni," signifying its ability to process text, speech, and video. It will be rolled out gradually across OpenAI products. Free tier users will get access to text and image features, while paid users will have higher capacity limits. GPT-4o is "natively multimodal" and can understand and generate content in various formats. Developers get an API at a lower cost with double the speed compared to the previous model. Improved visual processing allows GPT-4o to analyze images and answer questions about them. It offers better multilingual support for around 50 languages. For now, audio processing features are limited to trusted partners due to misuse concerns. The update includes a new user interface for ChatGPT, a macOS desktop app, and expanded features for free tier users.
Google's AI model, Gemini, can now process diverse data and is used by millions of developers and Google users. Google Photos lets you search memories with "Ask Photos" using multimodal AI. Gemini 1.5 Pro offers an increased context window for developers and consumers. Gemini integrates with Workspace to summarize attachments and meetings. AI upgrades Search with "AI Overviews" for a better user experience. Circle to Search and LearnLM solve problems and provide context for learning with AI. NotebookLM now offers audio outputs for personalized interactive conversations. Trillium TPUs, Google's 6th generation, deliver improved performance. Gemini Nano offers enhanced capabilities for processing text and contextual information. A scam alert feature uses Gemini Nano to identify scam calls in real time.
OpenAI is developing a search engine feature for ChatGPT. This feature will allow ChatGPT to search the web for answers and provide responses with source citations. This move positions OpenAI as a competitor to Google and other search engines. The envisioned search feature will allow users to ask questions and receive relevant information with sources. The system may also incorporate visuals like diagrams. OpenAI's decision to enter the search engine market is likely due to the increasing demand for advanced chatbot functionalities. Other AI-powered search startups like Perplexity are also gaining traction. Google is also integrating AI into its core search experience. Details about the user interface for the ChatGPT search engine are not yet available. An official unveiling of the new feature is expected soon.
The world's first AI beauty pageant, Miss AI, is organized by WAICA and Fanvue. AI-generated contestants will be judged on beauty, technical skills, and social media influence. WAICA aims to recognize the achievements of AI creators. Traditional pageant criteria and AI proficiency are both considered. According to WAICA, Miss AI signifies a big leap forward. Judges include AI creators Aitana Lopez and Emily Pellegrini, pageantry historian Sally Ann Fawcett, and marketing expert Andrew Bloch. Total prizes for Miss AI winners exceed $20,000. Miss AI winners will be announced on May 10th. An online awards ceremony will be held later in May.
Google introduced a new AI-fueled video creation tool called Google Vids. Google Vids will be part of the Google Workspace productivity suite. Google Vids is a video editing, writing and production assistant. Google Vids allows collaboration with colleagues in real time in the browser. Examples of videos include product pitches, training content or celebratory team videos. Google Vids creates a storyboard of the video based on your ideas. You can reorder the video, add transitions, select a template and insert an audio track. Colleagues can comment or make changes along the way. Google Vids is currently in limited testing and will be available for customers with Gemini for Workspace subscriptions. Google Vids helps transform the assets you already have into a compelling video.
OpenAI developed a new AI model called Voice Engine that can clone someone's voice. It requires only a 15-second audio sample and text input to generate natural-sounding speech. Voice Engine has been used in existing OpenAI products like the text-to-speech API, ChatGPT Voice, and Read Aloud. OpenAI is cautious about widely releasing Voice Engine due to potential misuse. Concerns include the spread of disinformation through synthetic voice. OpenAI wants to develop safeguards before a broader release. No release date for Voice Engine has been announced.
Sora, an AI-based text-to-video generator developed by OpenAI, is scheduled to be released to the public this year. According to Mira Murati, Chief Technology Officer at OpenAI, Sora may be available to the public within a few months. The initial release of Sora was limited to visual artists, designers, and filmmakers, with impressive results. OpenAI plans to integrate audio capabilities into Sora to enhance the realism of generated scenes. Users will have the ability to edit content produced by Sora, recognizing occasional inaccuracies inherent in AI-generated imagery. OpenAI aims to position Sora as a versatile tool for creative expression, allowing users to refine content according to their vision.
Sigmind.ai introduces TrafficFlow, AI-based software for real-time traffic management in Dhaka. TrafficFlow classifies 25 vehicle categories specific to Bangladesh for enhanced monitoring. Its features include vehicle detection, color classification, license plate recognition, multi-directional counting, speed estimation, wrong-direction identification, and reporting systems. It has been implemented by various authorities, including BEPZA, Dhaka Metropolitan Police, and CAAB. The software uses Vision Transformer (ViT) models and a deep learning-based Re-Identification tracker for accuracy. BEPZA piloted TrafficFlow for security purposes in Dhaka EPZ and Karnaphuli EPZ. Sigmind.ai aims to democratize AI and has won National ICT and APICTA Awards. The company has strategic partnerships with Robi Axiata Limited and Vehant Technologies Limited, and affiliations with NVIDIA and Google Cloud.
Google's search experience is changing with AI-powered summaries, which could replace the traditional blue-link search results. AI summaries can condense information into bite-sized pieces. This is convenient for busy users but might hurt websites relying on clicks. Niche websites and independent voices could be especially affected. AI bias and the inability to capture nuance are concerns with AI summaries. Over-reliance on AI summaries could limit exploration of the web. Transparency in AI models and user empowerment to access original sources are crucial. High-quality content that goes beyond summaries will remain valuable. The future of search is a balance between AI summaries and the open web.